A Korean Homonym Disambiguation System Based on Statistical Model Using Weights

Authors

Jun-Su Kim

Wang-Woo Lee

Chang-Hwan Kim

Cheol-Young Ock

Abstract

A homonym can be disambiguated by other words in its context, such as the nouns and predicates used with it. This paper uses semantic information (co-occurrence data) obtained from the definitions in the part-of-speech (POS) tagged UMRD-S. We analyze the results of an experiment on a homonym disambiguation system based on a statistical model to which Bayes' theorem is applied, and propose a model that adds a weight on the sense rate and a weight on the distance to adjacent words in order to improve accuracy. Applying the system, with its semantic information, to homonyms appearing in dictionary definition sentences gave an average accuracy of 98.32% for the 200 most frequent homonyms. We then selected 49 of those 200 homonyms (31 substantives and 18 predicates) and ran an experiment on 50,703 sentences, each containing one of the 49 homonyms, extracted from the 3.5-million-word Sejong Project tagged corpus (a corpus of morphologically analyzed words). Assigning a weight to the sense rate (prior probability) and a distance weight to the 5 words before and after the homonym to be disambiguated improved accuracy by 2.93% over disambiguation systems based on existing statistical models.
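The abstract does not give the exact form of the weighted model, so the following is only a minimal sketch of the general idea, under stated assumptions: the sense-rate weight is taken to be an exponent on the prior probability, the distance weight is taken to be 1/d for a context word d positions from the homonym, and co-occurrence probabilities use add-one smoothing. The class name `WeightedBayesDisambiguator`, its parameters, and the toy training data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class WeightedBayesDisambiguator:
    """Naive Bayes sense scorer with a weighted prior and distance-weighted context.

    Assumed scoring rule (not from the paper):
        score(s) = prior_weight * log P(s)
                   + sum over context words w at distance d of (1/d) * log P(w | s)
    """

    def __init__(self, prior_weight=1.0, window=5, smoothing=1.0):
        self.prior_weight = prior_weight  # assumed exponent on the sense rate
        self.window = window              # words considered before/after the homonym
        self.smoothing = smoothing        # add-k smoothing constant
        self.sense_counts = Counter()     # how often each sense was seen
        self.cooc = defaultdict(Counter)  # sense -> co-occurring word counts
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (sense, list_of_context_words)."""
        for sense, words in examples:
            self.sense_counts[sense] += 1
            for word in words:
                self.cooc[sense][word] += 1
                self.vocab.add(word)

    def distance_weight(self, distance):
        # Assumption: closer words count more, decaying as 1/distance.
        return 1.0 / distance

    def score(self, sense, tokens, idx):
        total = sum(self.sense_counts.values())
        log_score = self.prior_weight * math.log(self.sense_counts[sense] / total)
        denom = sum(self.cooc[sense].values()) + self.smoothing * len(self.vocab)
        start = max(0, idx - self.window)
        end = min(len(tokens), idx + self.window + 1)
        for j in range(start, end):
            if j == idx:
                continue  # skip the homonym itself
            numer = self.cooc[sense][tokens[j]] + self.smoothing
            log_score += self.distance_weight(abs(j - idx)) * math.log(numer / denom)
        return log_score

    def disambiguate(self, tokens, idx):
        """Return the highest-scoring sense for the homonym at tokens[idx]."""
        return max(self.sense_counts, key=lambda s: self.score(s, tokens, idx))


# Toy data for a homonym "bae" with two hypothetical sense labels.
model = WeightedBayesDisambiguator(prior_weight=1.0, window=5)
model.train([
    ("bae_fruit", ["eat", "sweet", "tree"]),
    ("bae_fruit", ["eat", "fruit"]),
    ("bae_ship", ["ride", "sea", "port"]),
])
ship_sense = model.disambiguate(["sea", "ride", "bae", "go"], 2)
fruit_sense = model.disambiguate(["eat", "bae"], 1)
```

In this toy setup the fruit sense has the higher prior, but the nearby words "ride" and "sea" outweigh it in the first sentence. Lowering `prior_weight` reduces the pull toward the most frequent sense, which is the kind of trade-off the two weights are meant to control.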


Similar resources

TAKTAG: Two-phase learning method for hybrid statistical/rule-based part-of-speech disambiguation

Both statistical and rule-based approaches to part-of-speech (POS) disambiguation have their own advantages and limitations. Especially for Korean, the narrow windows provided by the hidden Markov model (HMM) cannot cover the lexical and long-distance dependencies necessary for POS disambiguation. On the other hand, the rule-based approaches are not accurate and flexible to new tag-sets and language...


Rule-based Approach to Korean Morphological Disambiguation Supported by Statistical Method

Korean, as an agglutinative language, presents its own kinds of difficulty in morphological disambiguation, since a large number of its ambiguities come from stemming, while most ambiguities in French or English are related to the categorization of a morpheme. The current Korean morphological disambiguation systems adopt mainly statistical methods and some of them use rules in the postpr...


Disambiguation of Korean utterances using automatic intonation recognition

The paper describes research on the use of intonation for disambiguating utterance types of Korean spoken sentences. Based on tilt intonation theory [8], two related but separate experiments were performed, both using the Hidden Markov Model training technique. In the first experiment, a system is established so that rough boundary positions of major intonation events are detected. Subsequently...


Word Sense Disambiguation In A Korean-To-Japanese MT System Using Neural Networks

This paper presents a method to resolve word sense ambiguity in a Korean-to-Japanese machine translation system using neural networks. The execution of our neural network model is based on the concept codes of a thesaurus. Most previous word sense disambiguation approaches based on neural networks have limitations due to their huge feature set size. By contrast, we reduce the number of features...


Resolving Sense Ambiguity of Korean Nouns Based on Concept Co-occurrence Information

From the viewpoint of linguistic typology, Korean and Japanese have many grammatical similarities, which make it easy to construct a sense-tagged Korean corpus through an existing high-quality Japanese-to-Korean machine translation system. The sense-tagged corpus may serve as a knowledge source to extract useful clues for word sense disambiguation (WSD). This paper addresses a disambigu...




Publication date: 2002
